java code
Quality Evaluation of COBOL to Java Code Transformation
Froimovich, Shmulik, Gal, Raviv, Ibraheem, Wesam, Ziv, Avi
We present an automated evaluation system for assessing COBOL-to-Java code translation within IBM's watsonx Code Assistant for Z (WCA4Z). The system addresses key challenges in evaluating LLM-based translators, including model opacity and the complexity of translation quality assessment. Our approach combines analytic checkers with LLM-as-a-judge (LaaJ) techniques to deliver scalable, multi-faceted evaluations. The system supports continuous integration workflows, enables large-scale benchmarking, and reduces reliance on manual review. We describe the system architecture, evaluation strategies, and reporting mechanisms that provide actionable insights for developers and project managers, facilitating the evolution of high-quality, modernized codebases.
TCProF: Time-Complexity Prediction SSL Framework
Hahn, Joonghyuk, Ahn, Hyeseon, Kim, Jungin, Lim, Soohan, Han, Yo-Sub
Time complexity is a theoretic measure to determine the amount of time the algorithm needs for its execution. In reality, developers write algorithms into code snippets within limited resources, making the calculation of a code's time complexity a fundamental task. However, determining the precise time complexity of a code is theoretically undecidable. In response, recent advancements have leaned toward deploying datasets for code time complexity prediction and initiating preliminary experiments for this challenge. We investigate the challenge in low-resource scenarios where only a few labeled instances are given for training. Remarkably, we are the first to introduce TCProF: a Time-Complexity Prediction SSL Framework as an effective solution for code time complexity prediction in low-resource settings. TCProF significantly boosts performance by integrating our augmentation, symbolic modules, and a co-training mechanism, achieving a more than 60% improvement over self-training approaches. We further provide an extensive comparative analysis between TCProF, ChatGPT, and Gemini-Pro, offering a detailed evaluation of our approach. Our code is at https://github.com/peer0/few-shot-tc.
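To make the prediction target concrete, the following is a minimal sketch of two snippets that solve the same task with different time complexity. It is not taken from TCProF or its dataset; the class and method names are hypothetical. A complexity predictor would be expected to label the first method quadratic and the second linear, assuming hash-set operations take constant time on average.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical snippets: same task, different time complexity. */
final class ComplexityExamples {

    /** O(n^2): compares every pair of elements. */
    static boolean hasDuplicateQuadratic(int[] values) {
        for (int i = 0; i < values.length; i++) {
            for (int j = i + 1; j < values.length; j++) {
                if (values[i] == values[j]) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Expected O(n): single pass, assuming average constant-time HashSet operations. */
    static boolean hasDuplicateLinear(int[] values) {
        Set<Integer> seen = new HashSet<>();
        for (int value : values) {
            // Set.add returns false when the value was already present.
            if (!seen.add(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] sample = {3, 1, 4, 1, 5};
        System.out.println(hasDuplicateQuadratic(sample));
        System.out.println(hasDuplicateLinear(sample));
    }
}
```

Both methods return the same result on every input; only their growth rate differs, which is the property a time-complexity predictor has to recover from the code alone.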
Can LLMs Reason About Program Semantics? A Comprehensive Evaluation of LLMs on Formal Specification Inference
Le-Cong, Thanh, Le, Bach, Murray, Toby
Large Language Models (LLMs) are increasingly being used to automate programming tasks. Yet, LLMs' capabilities in reasoning about program semantics are still inadequately studied, leaving significant potential for further exploration. This paper introduces FormalBench, a comprehensive benchmark designed to evaluate LLMs' reasoning abilities on program semantics, particularly via the task of synthesizing formal program specifications to assist verifying program correctness. This task requires both comprehensive reasoning over all possible program executions and the generation of precise, syntactically correct expressions that adhere to formal syntax and semantics. Using this benchmark, we evaluated the ability of LLMs in synthesizing consistent and complete specifications. Our findings show that LLMs perform well with simple control flows but struggle with more complex structures, especially loops, even with advanced prompting. Additionally, LLMs exhibit limited robustness against semantic-preserving transformations. We also highlight common failure patterns and design self-repair prompts, improving success rates by 25%.
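As an illustration of the specification-synthesis task, here is a small sketch of a Java method with a formal specification written as JML-style comments. The notation is an assumption (the abstract does not name a specification language), and the example is not drawn from FormalBench. The helper `satisfiesPostcondition` is a hypothetical runtime check that mirrors the two `ensures` clauses; it stands in for a real verifier, which this sketch does not invoke.

```java
/** Hypothetical example: a method with a JML-style specification in comments. */
final class SpecExample {

    //@ requires values != null && values.length > 0;
    //@ ensures (\forall int k; 0 <= k && k < values.length; \result >= values[k]);
    //@ ensures (\exists int k; 0 <= k && k < values.length; \result == values[k]);
    static int max(int[] values) {
        int best = values[0];
        //@ loop_invariant 1 <= i && i <= values.length;
        //@ loop_invariant (\forall int k; 0 <= k && k < i; best >= values[k]);
        for (int i = 1; i < values.length; i++) {
            if (values[i] > best) {
                best = values[i];
            }
        }
        return best;
    }

    /** Runtime mirror of the two postconditions, for testing without a verifier. */
    static boolean satisfiesPostcondition(int[] values, int result) {
        boolean isUpperBound = true;
        boolean isElement = false;
        for (int v : values) {
            isUpperBound &= result >= v;
            isElement |= result == v;
        }
        return isUpperBound && isElement;
    }

    public static void main(String[] args) {
        int[] sample = {2, 9, 4};
        int result = max(sample);
        System.out.println(result + " " + satisfiesPostcondition(sample, result));
    }
}
```

Informally, the first `ensures` clause alone would hold for the method but would be incomplete: returning `Integer.MAX_VALUE` would also satisfy it. The second clause rules that out. The loop invariant is the part that requires reasoning over all iterations, which is where the abstract reports that LLMs struggle.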
CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution
Jana, Prithwish, Jha, Piyush, Ju, Haoyang, Kishore, Gautham, Mahajan, Aryan, Ganesh, Vijay
In this paper, we present an LLM-based code translation method and an associated tool called CoTran, that translates whole-programs from one high-level programming language to another. Current LLM-based code translation methods lack a training approach to ensure that the translated code reliably compiles or bears substantial functional equivalence to the input code. In our work, we train an LLM via reinforcement learning, by modifying the fine-tuning process to incorporate compiler feedback and symbolic execution (symexec)-based equivalence testing feedback that checks for functional equivalence between the input and output programs. The idea is to guide an LLM-in-training, via compiler and symexec-based testing feedback, by letting it know how far it is from producing perfect translations. We report on extensive experiments comparing CoTran with 14 other code translation tools that include human-written transpilers, LLM-based translation tools, and ChatGPT over a benchmark of more than 57,000 Java-Python equivalent pairs, and we show that CoTran outperforms them on relevant metrics such as compilation accuracy (CompAcc) and functional equivalence accuracy (FEqAcc). For example, our tool achieves 48.68% FEqAcc, 76.98% CompAcc for Python-to-Java translation, whereas the nearest competing tool (PLBART-base) only gets 38.26% and 75.77% resp. Also, built upon CodeT5, CoTran achieves +11.23%, +14.89% improvement on FEqAcc and +4.07%, +8.14% on CompAcc for Java-to-Python and Python-to-Java translation resp.
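The two reported metrics can be sketched as simple ratios over per-program outcomes. The definitions below are assumptions inferred from the metric names, not the paper's exact formulas: CompAcc is taken as the percentage of translations that compile, and FEqAcc as the percentage that compile and also behave equivalently to the input program. The `Outcome` class and its fields are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of compilation and functional-equivalence accuracy. */
final class TranslationMetrics {

    /** Result of evaluating one translated program (illustrative fields). */
    static final class Outcome {
        final boolean compiles;
        final boolean equivalent;

        Outcome(boolean compiles, boolean equivalent) {
            this.compiles = compiles;
            this.equivalent = equivalent;
        }
    }

    /** Percentage of translations that compile; 0.0 for an empty list. */
    static double compAcc(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long compiled = outcomes.stream().filter(o -> o.compiles).count();
        return 100.0 * compiled / outcomes.size();
    }

    /** Percentage that compile and are equivalent (assumed definition). */
    static double fEqAcc(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long equivalent = outcomes.stream()
                .filter(o -> o.compiles && o.equivalent)
                .count();
        return 100.0 * equivalent / outcomes.size();
    }

    public static void main(String[] args) {
        List<Outcome> outcomes = Arrays.asList(
                new Outcome(true, true),
                new Outcome(true, false),
                new Outcome(false, false),
                new Outcome(true, true));
        System.out.println("CompAcc = " + compAcc(outcomes));
        System.out.println("FEqAcc  = " + fEqAcc(outcomes));
    }
}
```

Under these assumed definitions FEqAcc can never exceed CompAcc, which is consistent with the figures quoted above (48.68% FEqAcc against 76.98% CompAcc for Python-to-Java).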
A Preliminary Analysis on the Code Generation Capabilities of GPT-3.5 and Bard AI Models for Java Functions
Destefanis, Giuseppe, Bartolucci, Silvia, Ortu, Marco
This paper evaluates the capability of two state-of-the-art artificial intelligence (AI) models, GPT-3.5 and Bard, in generating Java code given a function description. We sourced the descriptions from CodingBat.com, a popular online platform that provides practice problems to learn programming. We compared the Java code generated by both models based on correctness, verified through the platform's own test cases. The results indicate clear differences in the capabilities of the two models. GPT-3.5 demonstrated superior performance, generating correct code for approximately 90.6% of the function descriptions, whereas Bard produced correct code for 53.1% of the functions. While both models exhibited strengths and weaknesses, these findings suggest potential avenues for the development and refinement of more advanced AI-assisted code generation tools. The study underlines the potential of AI in automating and supporting aspects of software development, although further research is required to fully realize this potential.
Java and Artificial Intelligence - The Best Compatible Partners?
Artificial intelligence programming is highly complex and requires varied functionality and coding standards to execute successfully. Thus, no single language can be considered complete for AI projects. Although coders use high-level languages such as Python, C, Lisp, and Haskell, in many cases Java is regarded as the most widely loved and used language for AI. AI programmers use Java to develop machine learning solutions, search algorithms, neural networks, genetic programming, and multi-robot systems. With so many high-level languages available, what makes AI developers choose Java over any other? Here we bring you the truth behind the match.
Amazon CodeGuru: Let machine learning optimize your Java code
Amazon CodeGuru is a recently launched chargeable machine learning service, currently still in preview mode. It was first announced in Andy Jassy's keynote at Amazon's AWS re:Invent 2019 conference that took place on December 2-6, 2019. The service is comprised of two parts: Amazon CodeGuru Reviewer executes automated code reviews and provides code issue detection, whereas Amazon CodeGuru Profiler searches for ways to improve the application's performance. Amazon CodeGuru was trained on internal Amazon projects as well as more than 10,000 open source GitHub projects. Amazon CodeGuru Reviewer is designed to find issues in code via automatic detection and provide recommendations on resolving them.
Getting to machine learning in production takes focus
Data scientists tend to prefer languages like Python, while production systems run Java. To bridge this gap, Comcast has been building a set of Jython components for its data scientists. Jython is a Python implementation designed to let data scientists run Python apps natively on Java infrastructure. It was first released in 1997 and has grown in popularity among enterprises launching machine learning initiatives, because Python is commonly used by data scientists to build machine learning models. One limitation of this approach is that it can't take advantage of many of the features running on Flink.
Build and Deploy Scalable Machine Learning in Production with Kafka - DZone AI
Intelligent real-time applications are a game changer in any industry. Machine learning and its sub-topic, deep learning, are gaining momentum because machine learning allows computers to find hidden insights without being explicitly programmed where to look. This capability is needed for analyzing unstructured data, image recognition, speech recognition, and intelligent decision making. It is an important difference from traditional programming with Java, .NET, or Python. While the concepts behind machine learning are not new, the availability of big data sets and processing power allows every enterprise to build powerful analytic models.